Generalised Additive Models (GAMs) are highly flexible tools that have found particular application in the analysis of time series. In ecology, a host of recent papers and workshops (e.g. the 2018 Ecological Society of America workshop on GAMs hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross) have drawn attention to the wide range of complex ecological problems that GAMs can address. Given the many ways that GAMs can model temporal data, it is tempting to use the smooth functions estimated by a GAM to produce out of sample forecasts. Here we will inspect the behaviours of smoothing splines when extrapolating beyond the range of the training data to examine whether this can be useful in practice.
Here we will work with the mvgam package, which fits dynamic GAMs using MCMC sampling via the JAGS software (note that JAGS 4.3.0 is required; installation links are found here). Briefly, assume \(\tilde{\boldsymbol{y}}_{t}\) is the conditional expectation of a discrete response variable \(\boldsymbol{y}\) at time \(\boldsymbol{t}\). Assuming \(\boldsymbol{y}\) is drawn from an exponential family distribution (such as Poisson or Negative Binomial) with a log link function, the linear predictor for a dynamic GAM is written as:
\[log(\tilde{\boldsymbol{y}}_{t})={\boldsymbol{B}}_0+\sum\limits_{i=1}^I\boldsymbol{s}_{i,t}\boldsymbol{x}_{i,t}+\boldsymbol{z}_{t}\,,\] Here \(\boldsymbol{B}_{0}\) is the unknown intercept, the \(\boldsymbol{s}\)’s are unknown smooth functions of the covariates (\(\boldsymbol{x}\)’s) and \(\boldsymbol{z}\) is a dynamic latent trend component. Each smooth function \(\boldsymbol{s}_{i}\) is composed of spline-like basis expansions whose coefficients, which must be estimated, control the shape of the functional relationship between \(\boldsymbol{x}_{i}\) and \(log(\tilde{\boldsymbol{y}})\). The size of the basis expansion limits the smooth’s potential complexity, with a larger set of basis functions allowing greater flexibility. Among the advantages of GAMs are that they can model a diversity of response families, including discrete distributions (e.g. Poisson or Negative Binomial) that accommodate ecological features such as zero-inflation, and that they can be formulated to include hierarchical smoothing for multivariate responses. For the dynamic component, in its most basic form we assume a random walk with drift:
\[\boldsymbol{z}_{t}=\phi+\boldsymbol{z}_{t-1}+\boldsymbol{e}_{t}\,,\] where \(\phi\) is an optional drift parameter (included if the latent trend is assumed to be non-stationary) and \(\boldsymbol{e}\) is drawn from a zero-centred Gaussian distribution. This model is easily modified to include autoregressive terms, which mvgam accommodates up to order = 3.
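To build intuition for this dynamic component, here is a minimal simulation sketch in base R (not using mvgam; the parameter values are arbitrary choices for illustration) of a Negative Binomial series whose log-scale expectation follows a random walk with drift:

```r
set.seed(42)
n <- 100
phi <- 0.05                          # drift parameter
e <- rnorm(n, mean = 0, sd = 0.2)    # zero-centred Gaussian errors
z <- cumsum(phi + e)                 # z_t = phi + z_{t-1} + e_t, with z_0 = 0
mu <- exp(2 + z)                     # log link: intercept of 2 plus latent trend
y <- rnbinom(n, size = 10, mu = mu)  # Negative Binomial observations
plot(y, type = "l", ylab = "Simulated counts")
```

Adding autoregressive terms simply replaces the cumulative-sum recursion with, for example, `z[t] <- phi + ar1 * z[t - 1] + e[t]` inside a loop.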
# AirPassengers example
First we load the classic AirPassengers data (available in base R's datasets package) and convert it to an xts object. This series is a good starting point as it should be highly forecastable given its stable seasonal pattern and nearly linear trend
# devtools::install_github('nicholasjclark/mvgam')
library(mvgam)
library(dplyr)
library(xts)
library(forecast)
data("AirPassengers")
series <- xts::as.xts(floor(AirPassengers))
colnames(series) <- c("Air")
View the raw series, its STL decomposition and the distribution of observations. There is a clear seasonal pattern as well as an increasing trend over time for this series, and the distribution shows evidence of a skew suggestive of overdispersion
plot(series)
plot(stl(series, s.window = "periodic"))
hist(series)
Next use the series_to_mvgam function, which converts ts or xts objects into the correct format for mvgam. Here we set train_prop = 0.75, meaning we will include ~ 75% of the observations in the training set and use the remaining ~25% for forecast validation. We also randomly set ~10% of observations to NA so that we can evaluate models based on their predictive abilities
series[sample(1:length(series), floor(length(series)/10), FALSE)] <- NA
fake_data <- series_to_mvgam(series, freq = 12,
train_prop = 0.75)
Examine the returned object
head(fake_data$data_train)
head(fake_data$data_test)
As a first pass at modelling this series, we will fit a GAM that includes a seasonal smooth and a yearly smooth, as well as their tensor product interaction. As the seasonality is cyclic, we will use a cyclic cubic regression spline for season. The knots are set so that the boundaries of the cyclic smooth match up between December 31 and January 1. We will stick with the default thin plate regression spline for year. This is similar to what we might do when fitting a model in mgcv to try to forecast ahead, except here we also have an explicit model for the residual component. mvgam uses the jagam function from the mgcv package to generate a skeleton JAGS file and updates that file to incorporate any dynamic trend components (so far limited to no trend, random walks or AR models up to order 3). This is advantageous as any GAM that is allowed in mgcv can in principle be used in mvgam to fit dynamic linear models with smooth functions for nonlinear covariate effects. For multivariate series, mvgam also includes an option for dimension reduction by inducing trend correlations via a dynamic factor process. Here we use the Negative Binomial family and a burnin length of 2000 iterations (in practice we would probably run these models for a bit longer, but they tend to converge rather quickly for univariate series thanks to the useful starting values supplied by jagam). Note that feeding the data_test object does not mean these values are used to train the model. Rather, they are included as NA so that we can automatically create a forecast from the posterior predictions for these observations. This is useful for plotting forecasts without needing to run new data through the model’s equations later on
mod1 <- mvjagam(data_train = fake_data$data_train,
                data_test = fake_data$data_test,
                formula = y ~ s(season, bs = c("cc"), k = 12) +
                  s(year, k = 5) +
                  ti(season, year, bs = c("cc", "tp"), k = c(12, 3)),
                knots = list(season = c(0.5, 12.5)),
                family = "nb",
                trend_model = "None",
                burnin = 2000)
## Compiling rjags model...
## Starting 2 rjags simulations using a PSOCK cluster with 2 nodes on host
## 'localhost'
## Simulation complete
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## Finished running the simulation
## NOTE: Stopping adaptation
We can view the JAGS model file that has been generated to see the additions made to the base gam model. If the user sets return_jags_data = TRUE when calling mvjagam, both this file and the resulting jags_data object can be modified to fit more complex Bayesian models. Note that here the AR and phi terms have been set to zero as this model does not include a dynamic trend component
mod1$model_file
## [1] "model {"
## [2] ""
## [3] "## GAM linear predictor"
## [4] "eta <- X %*% b"
## [5] ""
## [6] "## Mean expectations"
## [7] "for (i in 1:n) {"
## [8] " for (s in 1:n_series) {"
## [9] " mu[i, s] <- exp(eta[ytimes[i, s]] + trend[i, s])"
## [10] " }"
## [11] "}"
## [12] ""
## [13] ""
## [14] "## State space trend estimates"
## [15] "for(s in 1:n_series) {"
## [16] " trend[1, s] <- 0"
## [17] "}"
## [18] ""
## [19] "for(s in 1:n_series) {"
## [20] " trend[2, s] <- 0"
## [21] "}"
## [22] ""
## [23] "for(s in 1:n_series) {"
## [24] " trend[3, s] <- 0"
## [25] "}"
## [26] ""
## [27] "for (i in 4:n){"
## [28] " for (s in 1:n_series){"
## [29] " trend[i, s] <- 0"
## [30] " }"
## [31] "}"
## [32] ""
## [33] "## AR components"
## [34] "for (s in 1:n_series){"
## [35] " phi[s] <- 0"
## [36] " ar1[s] <- 0"
## [37] " ar2[s] <- 0"
## [38] " ar3[s] <- 0"
## [39] " tau[s] ~ dgamma(0.01, 0.001)"
## [40] "}"
## [41] ""
## [42] "## Negative binomial likelihood functions"
## [43] "for (i in 1:n) {"
## [44] " for (s in 1:n_series) {"
## [45] " y[i, s] ~ dnegbin(rate[i, s], r)"
## [46] " rate[i, s] <- ifelse((r / (r + mu[i, s])) < min_eps, min_eps,"
## [47] " (r / (r + mu[i, s])))"
## [48] " }"
## [49] "}"
## [50] "r ~ dexp(0.001)"
## [51] ""
## [52] "## Posterior predictions"
## [53] "for (i in 1:n) {"
## [54] " for (s in 1:n_series) {"
## [55] " ypred[i, s] ~ dnegbin(rate[i, s], r)"
## [56] " }"
## [57] "}"
## [58] ""
## [59] " ## Parametric effect priors CHECK tau=1/54^2 is appropriate!"
## [60] "for (i in 1:1) { b[i] ~ dnorm(0, 0.1) }"
## [61] " ## prior for s(season)... "
## [62] " K1 <- S1[1:10,1:10] * lambda[1] "
## [63] " b[2:11] ~ dmnorm(zero[2:11],K1) "
## [64] " ## prior for s(year)... "
## [65] " K2 <- S2[1:4,1:4] * lambda[2] + S2[1:4,5:8] * lambda[3]"
## [66] " b[12:15] ~ dmnorm(zero[12:15],K2) "
## [67] " ## prior for ti(season,year)... "
## [68] " K3 <- S3[1:20,1:20] * lambda[4] + S3[1:20,21:40] * lambda[5]"
## [69] " b[16:35] ~ dmnorm(zero[16:35],K3) "
## [70] " ## smoothing parameter priors CHECK..."
## [71] " for (i in 1:5) {"
## [72] " lambda[i] ~ dexp(0.05)"
## [73] " rho[i] <- log(lambda[i])"
## [74] " }"
## [75] "}"
Inspect the summary from the model, which is somewhat similar to an mgcv model summary with extra information about convergence of the unobserved parameters. The estimated degrees of freedom, smooth coefficients and smooth penalties are all extracted from the mvgam model using sim2jam so that approximate p-values can be calculated using Nychka’s method (following Wood (2013) Biometrika 100(1), 221-228). Note however that this function is still in development and approximate p-values may not be entirely accurate
summary_mvgam(mod1)
## GAM formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5) + ti(season,
## year, bs = c("cc", "tp"), k = c(12, 3))
##
## Family:
## Negative Binomial
##
## N series:
## 1
##
## N observations per series:
## 108
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 1195.714 2911.724 6539.513 1 3187
##
## GAM smooth term approximate significances:
## edf Ref.df F p-value
## s(season) 9.255 10.000 38.288 <2e-16 ***
## s(year) 3.047 4.000 0.002 0.925
## ti(season,year) 12.191 20.000 0.009 0.837
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 5.356114926 5.371177e+00 5.38607078 1.00 1951
## s(season).1 -0.284030406 -2.017047e-01 -0.11765921 1.06 139
## s(season).2 -0.165565944 -1.095047e-01 -0.05773027 1.06 326
## s(season).3 -0.051508567 9.861203e-03 0.06945783 1.04 294
## s(season).4 -0.072643285 -2.087583e-02 0.03211649 1.01 323
## s(season).5 0.022705398 7.261906e-02 0.12013348 1.01 327
## s(season).6 0.184389940 2.313444e-01 0.27764457 1.07 260
## s(season).7 0.166644456 2.096020e-01 0.25212156 1.01 454
## s(season).8 0.008199819 6.143072e-02 0.11672200 1.01 234
## s(season).9 -0.200509298 -1.441863e-01 -0.08583229 1.03 214
## s(season).10 -0.205167198 -1.143819e-01 -0.03263878 1.04 128
## s(year).1 -0.092397704 -3.122586e-02 0.02674706 1.11 134
## s(year).2 -0.095504981 6.913020e-03 0.15477217 1.56 31
## s(year).3 -0.093434635 2.627989e-02 0.20641565 1.62 25
## s(year).4 0.319064872 3.744268e-01 0.42957180 1.04 141
## ti(season,year).1 -0.079408299 3.949175e-02 0.13853610 1.05 119
## ti(season,year).2 -0.144519698 -4.305631e-02 0.03813943 1.04 161
## ti(season,year).3 -0.025247614 7.294316e-02 0.16647049 1.03 157
## ti(season,year).4 -0.144451115 -5.410482e-02 0.02265125 1.01 192
## ti(season,year).5 -0.067702130 3.176347e-02 0.13540368 1.03 152
## ti(season,year).6 -0.100221659 -2.027803e-02 0.06633519 1.05 209
## ti(season,year).7 -0.145725496 -1.638821e-02 0.10052385 1.09 106
## ti(season,year).8 -0.088372985 -6.561229e-03 0.07012538 1.02 231
## ti(season,year).9 -0.119983938 -2.825693e-02 0.06454812 1.01 167
## ti(season,year).10 -0.049848476 3.173896e-02 0.10659279 1.02 192
## ti(season,year).11 -0.132742921 -4.113133e-02 0.04239211 1.10 169
## ti(season,year).12 -0.024146189 4.274241e-02 0.12000655 1.08 237
## ti(season,year).13 -0.116647484 -2.427482e-02 0.06360194 1.04 171
## ti(season,year).14 -0.037651084 3.092864e-02 0.10047571 1.00 274
## ti(season,year).15 -0.092633998 1.908019e-03 0.09921430 1.00 176
## ti(season,year).16 -0.080754013 5.097813e-03 0.07858999 1.00 225
## ti(season,year).17 -0.102102877 -6.109331e-05 0.09538039 1.01 171
## ti(season,year).18 -0.092841274 -1.365255e-02 0.07374290 1.03 207
## ti(season,year).19 -0.073436270 1.885933e-02 0.12410420 1.07 176
## ti(season,year).20 -0.123996965 -1.556987e-02 0.07718077 1.06 144
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 3.3705218 4.400740 5.179678 1.00 817
## s(year) 1.5724993 3.338635 4.548089 1.01 442
## s(year)2 -0.1849781 2.297352 3.648062 1.00 3624
## ti(season,year) 3.4916393 4.541409 5.307331 1.00 1300
## ti(season,year)2 2.5130823 3.861262 4.865849 1.00 1113
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0 0 0 NaN 0
## ar1 0 0 0 NaN 0
## ar2 0 0 0 NaN 0
## ar3 0 0 0 NaN 0
##
mvgam takes a fitted gam model and adapts the model file to fit in JAGS, with possible extensions to deal with stochastic trend components and other features. But as this model has not been modified much from the original gam model with the same formula (i.e. we have not added any stochastic trend components to the linear predictor yet, which is the main feature of mvgam), the summary of the unmodified gam model is also useful and fairly accurate
summary(mod1$mgcv_model)
##
## Family: Negative Binomial(431819802.287)
## Link function: log
##
## Formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, k = 5) + ti(season,
## year, bs = c("cc", "tp"), k = c(12, 3))
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.390904 0.006994 770.7 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(season) 7.562 10 35.358 <2e-16 ***
## s(year) 1.000 1 2438.785 <2e-16 ***
## ti(season,year) 1.464 20 0.188 0.0639 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.989 Deviance explained = 98.9%
## fREML = 135.22 Scale est. = 1 n = 99
Ordinarily we would be quite pleased with this result, as we have explained most of the variation in the series with a fairly simple model. We can plot the estimated smooth functions with their associated credible intervals, which are interpreted similarly to mgcv plots except they are not centred at zero (mvgam predicts the smooth with all other covariates at their means, rather than at zero). So the plot below, for the season smooth, shows the estimated values for the linear predictor (on the log scale in this case) if year was set to its mean
plot_mvgam_smooth(mod1, series = 1, smooth = "season")
And here is the plot of the smooth function for year, which has essentially estimated a straight line
plot_mvgam_smooth(mod1, series = 1, smooth = "year")
Perform a series of posterior predictive checks to see if the model is able to simulate data for the training period that looks realistic and unbiased. First, examine simulated kernel densities for posterior predictions (yhat) and compare to the density of the observations (y)
plot_mvgam_ppc(mod1, series = 1, type = "density",
legend_position = "bottomright")
Now plot the distribution of predicted means compared to the observed mean
plot_mvgam_ppc(mod1, series = 1, type = "mean")
Next examine simulated empirical Cumulative Distribution Functions (CDF) for posterior predictions (yhat) and compare to the CDF of the observations (y)
plot_mvgam_ppc(mod1, series = 1, type = "cdf")
Finally look for any biases in predictions by examining a Probability Integral Transform (PIT) histogram. If our predictions are not biased one way or another (i.e. not consistently under- or over-predicting), this histogram should look roughly uniform
plot_mvgam_ppc(mod1, series = 1, type = "pit")
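For intuition about what this check computes, here is a sketch in base R of a randomised PIT for discrete counts under a known Poisson model (illustrative only; mvgam's internal calculations may differ). The randomised quantile (Dunn-Smyth) residuals used below are simply normal quantiles, qnorm(), of these same PIT values:

```r
set.seed(1)
y <- rpois(500, lambda = 4)                  # 'observations'
lower <- ppois(y - 1, lambda = 4)            # forecast CDF just below each observation
upper <- ppois(y, lambda = 4)                # forecast CDF at each observation
pit <- runif(500, min = lower, max = upper)  # randomised PIT values
hist(pit)                                    # roughly uniform for a calibrated model
```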
All of these plots indicate the model is well calibrated against the training data, with no apparent pathological behaviours exhibited. Now for some investigation of the estimated relationships and forecasts. We can also perform residual diagnostics using randomised quantile (Dunn-Smyth) residuals. These look reasonable overall, though there is some autocorrelation left in the residuals for this series
plot_mvgam_resids(mod1, series = 1)
Ok so the model is doing well when fitting against the training data, but how good are its forecasts? The yearly trend is being extrapolated into the future, which controls most of the shape and uncertainty in the forecast. We see there is a reasonable estimate of uncertainty and the out of sample observations (to the right of the dashed line) are all within the model’s 95% HPD intervals
plot_mvgam_fc(mod1, series = 1, data_test = fake_data$data_test)
The extrapolation of the smooth for year can be better viewed by feeding new data to the plot_mvgam_smooth function. Here we feed values of year to cover the training and testing set to see how the extrapolation would continue into the future. The dashed line marks the end of the training period
plot_mvgam_smooth(mod1, series = 1, "year",
                  newdata = expand.grid(year = seq(min(fake_data$data_train$year),
                                                   max(fake_data$data_test$year),
                                                   length.out = 500),
                                        season = mean(fake_data$data_train$season),
                                        series = unique(fake_data$data_train$series)))
abline(v = max(fake_data$data_train$year), lty = "dashed")
We can also re-do the posterior predictive checks, but this time focusing only on the out of sample period. This will give us better insight into how the model is performing and whether it is able to simulate realistic and unbiased future values
plot_mvgam_ppc(mod1, series = 1, type = "density",
data_test = fake_data$data_test, legend_position = "bottomright")
plot_mvgam_ppc(mod1, series = 1, type = "mean",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod1, series = 1, type = "cdf",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod1, series = 1, type = "pit",
data_test = fake_data$data_test)
There are some very clear problems with the way this model is generating future predictions. Perhaps a different smooth function for year can help? Here we fit our original model again but use a different smooth term for year to try to capture the long-term trend (using B splines with multiple penalties, following the excellent example by Gavin Simpson on extrapolating with smooth terms). This is similar to what we might do when trying to forecast ahead from a more wiggly function, as B splines have the useful property of allowing the penalty to be extended into the range of values we wish to predict (in this case, the years in data_test)
mod2 <- mvjagam(data_train = fake_data$data_train,
                data_test = fake_data$data_test,
                formula = y ~ s(season, bs = c("cc"), k = 12) +
                  s(year, bs = "bs", m = c(3, 2, 1, 0)) +
                  ti(season, year, bs = c("cc", "tp"), k = c(12, 3), m = c(2, 1)),
                knots = list(season = c(0.5, 12.5),
                             year = c(min(fake_data$data_train$year) - 1,
                                      min(fake_data$data_train$year),
                                      max(fake_data$data_train$year),
                                      max(fake_data$data_test$year))),
                family = "nb",
                trend_model = "None",
                burnin = 2000)
## Compiling rjags model...
## Starting 2 rjags simulations using a PSOCK cluster with 2 nodes on host
## 'localhost'
## Simulation complete
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## Finished running the simulation
## NOTE: Stopping adaptation
Again as we haven’t modified the base gam much, the summary from the mgcv model is fairly accurate
summary_mvgam(mod2)
## GAM formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3,
## 2, 1, 0)) + ti(season, year, bs = c("cc", "tp"), k = c(12,
## 3), m = c(2, 1))
##
## Family:
## Negative Binomial
##
## N series:
## 1
##
## N observations per series:
## 108
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 1254.066 3048.139 6627.217 1 1967
##
## GAM smooth term approximate significances:
## edf Ref.df F p-value
## s(season) 8.361 10.000 42.402 <2e-16 ***
## s(year) 2.185 9.000 0.047 0.5173
## ti(season,year) 1.234 10.000 0.695 0.0336 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 5.3573412431 5.3722575563 5.386727331 1.01 936
## s(season).1 -0.2023611555 -0.1481174097 -0.100292251 1.00 275
## s(season).2 -0.1478934343 -0.0989617215 -0.056829255 1.01 317
## s(season).3 -0.0339368808 0.0127561294 0.057141785 1.00 357
## s(season).4 -0.0513820018 -0.0060136025 0.036371707 1.06 364
## s(season).5 0.0393576427 0.0835389142 0.126987565 1.07 342
## s(season).6 0.1989298804 0.2394621035 0.282909417 1.01 363
## s(season).7 0.1768111649 0.2159579515 0.252825756 1.00 426
## s(season).8 0.0202391819 0.0654759327 0.107927746 1.01 349
## s(season).9 -0.1683021382 -0.1202322925 -0.077749464 1.01 327
## s(season).10 -0.1517717727 -0.1026729721 -0.058015255 1.02 353
## s(year).1 -1.1377973050 -0.8975819299 -0.026501597 1.18 14
## s(year).2 -0.3960929029 -0.2936355368 -0.262597561 1.26 61
## s(year).3 -0.1795695059 -0.0506799557 0.004911777 1.08 22
## s(year).4 0.1473750204 0.2062242441 0.264553395 1.01 51
## s(year).5 0.2648795702 0.3959487479 0.454608616 1.05 24
## s(year).6 0.5543585857 0.6641294627 0.722182940 1.02 41
## s(year).7 0.7047598961 0.7721138541 0.824340121 1.02 54
## s(year).8 0.6086493544 0.8630888519 1.003121509 1.24 35
## s(year).9 0.8804221748 1.1063252933 1.492086655 1.70 17
## ti(season,year).1 -0.0631672210 -0.0198073573 -0.001712413 1.53 10
## ti(season,year).2 -0.0541727300 -0.0170307121 0.001009253 1.28 13
## ti(season,year).3 -0.0254307323 -0.0051719607 0.017235738 1.21 46
## ti(season,year).4 -0.0066642022 0.0088860841 0.045160511 1.87 31
## ti(season,year).5 -0.0001614933 0.0185344846 0.061498367 1.91 16
## ti(season,year).6 0.0027205620 0.0268985405 0.062152937 1.62 14
## ti(season,year).7 -0.0014288456 0.0228976442 0.055568654 1.53 9
## ti(season,year).8 -0.0091378482 0.0143473567 0.050978002 1.51 27
## ti(season,year).9 -0.0187402516 0.0007670589 0.017060745 1.26 37
## ti(season,year).10 -0.0283999319 -0.0070577090 0.008904759 2.35 27
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 5.408829 6.583573 7.602904 1.01 489
## s(year) 5.151753 9.071822 11.183027 1.27 10
## s(year)2 -1.263879 1.196932 2.569996 1.00 2767
## s(year)3 -10.410677 -7.174392 -5.469394 1.00 3231
## ti(season,year) 8.160301 10.108173 11.467487 1.29 65
## ti(season,year)2 -12.926516 -9.603458 -7.929297 1.00 3110
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0 0 0 NaN 0
## ar1 0 0 0 NaN 0
## ar2 0 0 0 NaN 0
## ar3 0 0 0 NaN 0
##
summary(mod2$mgcv_model)
##
## Family: Negative Binomial(225134521.017)
## Link function: log
##
## Formula:
## y ~ s(season, bs = c("cc"), k = 12) + s(year, bs = "bs", m = c(3,
## 2, 1, 0)) + ti(season, year, bs = c("cc", "tp"), k = c(12,
## 3), m = c(2, 1))
##
## Parametric coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 5.390950 0.006994 770.8 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df F p-value
## s(season) 7.5621 10 35.354 <2e-16 ***
## s(year) 0.9996 7 348.271 <2e-16 ***
## ti(season,year) 1.4662 10 0.376 0.0635 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.989 Deviance explained = 98.9%
## fREML = 135.56 Scale est. = 1 n = 99
This model explains the variation just as well as the thin plate yearly model above, so we’d be tempted to use it for prediction (though of course we’d want to perform a series of checks following advice from Simon Wood; see lectures by Gavin Simpson for more information on how to perform these checks). So how do the forecasts look?
plot_mvgam_fc(mod2, series = 1, data_test = fake_data$data_test)
Again the forecast is being driven primarily by the extrapolation behaviour of the B spline. Look at this behaviour for the year smooth as we did above by feeding new data for prediction
plot_mvgam_smooth(mod2, series = 1, "year",
                  newdata = expand.grid(year = seq(min(fake_data$data_train$year),
                                                   max(fake_data$data_test$year),
                                                   length.out = 500),
                                        season = mean(fake_data$data_train$season),
                                        series = unique(fake_data$data_train$series)))
abline(v = max(fake_data$data_train$year), lty = "dashed")
Residual and posterior in-training predictive checks for this model will look similar to the above, with some autocorrelation left in residuals but nothing terribly alarming. But the forecast checks again show some problems:
plot_mvgam_ppc(mod2, series = 1, type = "density",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod2, series = 1, type = "mean",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod2, series = 1, type = "cdf",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod2, series = 1, type = "pit",
data_test = fake_data$data_test)
Can we improve the forecasts by removing our reliance on extrapolation? Now we will fit a model in which the GAM component of the linear predictor captures the repeated seasonality (again with a cyclic smooth) and a dynamic latent trend captures the residual process using AR parameters (up to order 3). This model is a mildly modified version of the base mgcv model in which the linear predictor is augmented with the latent trend component. Despite the added complexity of the time series component, the model still fits in ~30 seconds on most machines
mod3 <- mvjagam(data_train = fake_data$data_train,
                data_test = fake_data$data_test,
                formula = y ~ s(season, bs = c("cc"), k = 12) +
                  ti(season, year, bs = c("cc", "tp"), k = c(12, 3), m = c(2, 1)),
                knots = list(season = c(0.5, 12.5)),
                family = "nb",
                trend_model = "AR3",
                drift = TRUE,
                burnin = 2000)
## Compiling rjags model...
## Starting 2 rjags simulations using a PSOCK cluster with 2 nodes on host
## 'localhost'
## Simulation complete
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## Finished running the simulation
## NOTE: Stopping adaptation
In this case the fitted model differs more from the base mgcv model that was used by jagam to produce the skeleton JAGS file, so the summary of that base model is less accurate. But we can still check the summary of the mvjagam model to examine convergence for key parameters
summary_mvgam(mod3)
## GAM formula:
## y ~ s(season, bs = c("cc"), k = 12) + ti(season, year, bs = c("cc",
## "tp"), k = c(12, 3), m = c(2, 1))
##
## Family:
## Negative Binomial
##
## N series:
## 1
##
## N observations per series:
## 108
##
## Dispersion parameter estimate:
## 2.5% 50% 97.5% Rhat n.eff
## r 1225.863 2979.429 6531.063 1 3293
##
## GAM smooth term approximate significances:
## edf Ref.df F p-value
## s(season) 6.578e+00 1.000e+01 2.826 6.4e-06 ***
## ti(season,year) 3.704e-04 2.000e+01 0.000 0.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 4.6728290001 4.773051e+00 4.8515441996 1.07 21
## s(season).1 -0.1274168484 -7.557710e-02 -0.0322388671 1.04 222
## s(season).2 -0.0855679985 -3.833443e-02 0.0031169653 1.03 277
## s(season).3 -0.0027684215 3.684451e-02 0.0789471347 1.01 342
## s(season).4 -0.0156711375 2.680171e-02 0.0679867653 1.00 285
## s(season).5 0.0648402817 1.056747e-01 0.1449892750 1.00 335
## s(season).6 0.1987350252 2.361874e-01 0.2769362945 1.00 309
## s(season).7 0.1712034476 2.057576e-01 0.2427891287 1.00 420
## s(season).8 -0.0050708784 3.921133e-02 0.0768434216 1.01 345
## s(season).9 -0.2116957992 -1.627073e-01 -0.1161797351 1.01 224
## s(season).10 -0.2085275540 -1.635219e-01 -0.1187665428 1.06 256
## ti(season,year).1 -0.0006316098 2.645038e-05 0.0006818135 1.23 152
## ti(season,year).2 -0.0006698604 1.206314e-05 0.0007238186 1.30 127
## ti(season,year).3 -0.0008455117 3.017815e-05 0.0006283579 1.14 93
## ti(season,year).4 -0.0005972379 -1.524505e-05 0.0008272758 1.18 81
## ti(season,year).5 -0.0008607444 -2.677395e-05 0.0005779525 1.23 105
## ti(season,year).6 -0.0006350234 1.825672e-05 0.0006602766 1.12 118
## ti(season,year).7 -0.0004647063 5.384553e-05 0.0007654281 1.10 98
## ti(season,year).8 -0.0007729944 -4.428847e-05 0.0005118114 1.13 92
## ti(season,year).9 -0.0006869520 -4.313527e-05 0.0007214887 1.17 93
## ti(season,year).10 -0.0007621933 4.686870e-05 0.0007039158 1.21 99
## ti(season,year).11 -0.0006157635 -1.834030e-05 0.0007251982 1.39 101
## ti(season,year).12 -0.0008129181 7.541946e-06 0.0005457591 1.46 104
## ti(season,year).13 -0.0007144632 -3.400048e-05 0.0006688236 1.17 69
## ti(season,year).14 -0.0006526881 4.072160e-05 0.0006291402 1.15 108
## ti(season,year).15 -0.0006026052 1.157521e-05 0.0008550465 1.14 78
## ti(season,year).16 -0.0008697351 -2.785706e-06 0.0004985807 1.09 93
## ti(season,year).17 -0.0005161749 4.301149e-05 0.0007274368 1.09 92
## ti(season,year).18 -0.0008453013 -3.985252e-05 0.0005518880 1.15 88
## ti(season,year).19 -0.0006255317 7.646535e-05 0.0007005311 1.06 138
## ti(season,year).20 -0.0007522235 -7.670758e-05 0.0004896139 1.11 99
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 6.091034 7.262372 8.334506 1.00 455
## ti(season,year) -5.311781 -1.794381 -0.150464 1.00 3297
## ti(season,year)2 13.332763 15.616935 18.042854 2.28 18
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0.01149027 0.02354465 0.03781871 1.00 205
## ar1 -0.09814880 0.30553300 0.69806295 1.01 672
## ar2 -0.09541980 0.32699493 0.70861299 1.01 1212
## ar3 -0.03051617 0.37553512 0.74122442 1.00 832
##
The seasonal term is obviously still very important. Plot it here
plot_mvgam_smooth(mod3, series = 1, "season")
Plot diagnostics of posterior median Dunn-Smyth residuals, where the autocorrelation is now captured by the latent trend process
plot_mvgam_resids(mod3, series = 1)
How does the model’s posterior forecast distribution compare to the previous models?
plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "density",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "mean",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "cdf",
data_test = fake_data$data_test)
plot_mvgam_ppc(mod3, series = 1, type = "pit",
data_test = fake_data$data_test)
None of these models is a perfect representation of the data generating process, but the forecast from our dynamic GAM is more stable and more accurate than those of the previous models, with the added advantage that we can place more trust in our estimated smooth for season because we have captured the residual autocorrelation. The posterior checks for our dynamic GAM also look much better than those of the two previous models. Plot the estimated latent dynamic trend (which is on the log scale)
plot_mvgam_trend(mod3, series = 1, data_test = fake_data$data_test)
Benchmarking against “null” models is a very important part of evaluating a proposed forecast model. After all, if our complex dynamic model can’t generate better predictions than a random walk or mean forecast, is it really telling us anything new about the data-generating process? Here we illustrate the model comparison utilities in mvgam by fitting a simpler model that smooths on a white noise covariate rather than on the seasonal variable. Because the white noise covariate is not informative and we are using a random walk for the trend process, this model essentially reduces to a Poisson observation model over a dynamic random walk trend.
fake_data$data_train$fake_cov <- rnorm(NROW(fake_data$data_train))
fake_data$data_test$fake_cov <- rnorm(NROW(fake_data$data_test))
mod4 <- mvjagam(data_train = fake_data$data_train,
                data_test = fake_data$data_test,
                formula = y ~ s(fake_cov, k = 3),
                family = "poisson", trend_model = "RW",
                drift = TRUE, burnin = 3000)
## Compiling rjags model...
## Starting 2 rjags simulations using a PSOCK cluster with 2 nodes on host
## 'localhost'
## Simulation complete
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## FALSEFinished running the simulation
## NOTE: Stopping adaptation
Look at this model’s proposed forecast
plot_mvgam_fc(mod4, series = 1, data_test = fake_data$data_test)
Here we showcase how different dynamic models can be compared using rolling probabilistic forecast evaluation, which is especially useful if we don’t already have out of sample observations for comparing forecasts. This function sets up a sequence of evaluation timepoints along a rolling window within the training data to evaluate ‘out-of-sample’ forecasts. The trends are rolled forward a total of fc_horizon timesteps according to their estimated state space dynamics to generate an ‘out-of-sample’ forecast that is evaluated against the true observations in the horizon window. We are therefore simulating a situation where the model’s parameters had already been estimated but we had only observed data up to the evaluation timepoint, and we would like to generate forecasts that consider the possible future paths for the latent trends as well as the true observed values for any other covariates in the horizon window. Evaluation involves calculating the Discrete Rank Probability Score and a binary indicator for whether or not the true value lies within the forecast’s 90% prediction interval. For this test we compare the two models on the exact same sequence of 30 evaluation points using fc_horizon = 6
compare_mvgams(model1 = mod3, model2 = mod4,
fc_horizon = 6, n_evaluations = 30, n_cores = 3)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Model 1 2.770723 4.167924 4.912382 5.778543 6.320312 18.15886 14
## Model 2 4.282589 10.640313 18.055425 23.683450 27.789384 102.91668 14
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 1
## Model 2 0.9578313
The series of plots generated by compare_mvgams clearly shows that the first dynamic model generates better predictions: in each plot, DRPS across the forecast horizon is lower for the first model than for the second. This kind of evaluation is often more appropriate for forecast models than complexity-penalising fit metrics such as AIC or BIC. However, comparing forecasts of the dynamic models against the two models with yearly smooth terms (mod1 and mod2) using the rolling window approach is actually not recommended, as the yearly smooth models have already seen all the possible in-sample values of year and so should be able to predict incredibly well by interpolating through the range of the fitted smooth. By contrast, the dynamic components in our dynamic GAMs (models 3 and 4) produce true forecasts when running the rolling window approach. Nevertheless, when we compare some of these models (here mod1 with the thin plate yearly smooth vs mod3 with the AR3 trend process) as we did above for the random walk model, we still find that our dynamic GAM produces superior probabilistic forecasts
compare_mvgams(model1 = mod1, model2 = mod3,
fc_horizon = 6, n_evaluations = 30, n_cores = 3)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Model 1 69.21863 110.6583 148.141057 151.972669 184.149556 297.08849 14
## Model 2 2.86667 4.1466 4.932769 5.842487 6.476714 18.88043 14
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 1
## Model 2 1
The same holds true when comparing against the B spline model (mod2)
compare_mvgams(model1 = mod2, model2 = mod3,
fc_horizon = 6, n_evaluations = 30, n_cores = 3)
## DRPS summaries per model (lower is better)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## Model 1 70.505228 111.679453 147.944950 153.280244 186.00657 293.75193 14
## Model 2 2.777624 4.079587 4.949371 5.821564 6.40755 18.89155 14
##
## 90% interval coverages per model (closer to 0.9 is better)
## Model 1 1
## Model 2 0.9939759
Now we proceed by exploring how forecast distributions from an mvgam object can be automatically updated in light of new incoming observations. This works by generating a set of “particles”, each of which captures a unique proposal about the current state of the system (in this case, the current estimate of the latent trend component). The next observation in data_assim is assimilated and particles are weighted by how well their proposal (i.e. their proposed forecast, prior to seeing the new data) matched the new observation. For univariate models such as the ones we’ve fitted so far, this weight is the proposal’s log-likelihood under the model’s observation family (Poisson or Negative Binomial). For multivariate models, a multivariate composite likelihood is used for the weights. Once weights are calculated, we use importance sampling to update the model’s forecast distribution for the remaining forecast horizon. Begin by initiating a set of 5000 particles by assimilating the next observation in data_test and storing the particles in the default location (a directory called pfilter within the working directory)
pfilter_mvgam_init(object = mod3, n_particles = 5000,
n_cores = 2, data_assim = fake_data$data_test)
## Saving particles to pfilter/particles.rda
## ESS = 31.09681
Now we are ready to run the particle filter. This function will assimilate the next six out of sample observations in data_test and update the forecast after each assimilation step. This works in an iterative fashion by calculating each particle’s weight, then using a kernel smoothing algorithm to “pull” low weight particles toward the high-likelihood space before assimilating the next observation. The strength of the kernel smoother is controlled by kernel_lambda, which in our experience works well when left to the default of 1. If the Effective Sample Size of particles drops too low, suggesting we are putting most of our belief in a very small set of particles, an automatic resampling step is triggered to increase particle diversity and reduce the chance that our forecast intervals become too narrow and incapable of adapting to changing conditions
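To see why the Effective Sample Size behaves this way, here is a conceptual sketch of how importance weights and the ESS are typically computed from particle log-likelihoods. This is not mvgam’s internal code; the observed count, the particle proposals and the resampling threshold are all assumed for illustration:

```r
set.seed(42)
n_particles <- 5000
# Each particle proposes a latent trend state; its weight is the
# likelihood of the newly observed count under that proposal
proposals <- rnorm(n_particles, mean = 2, sd = 0.5)
log_liks <- dpois(10, lambda = exp(proposals), log = TRUE)
# Normalise on the log scale to avoid numerical underflow
weights <- exp(log_liks - max(log_liks))
weights <- weights / sum(weights)
# ESS = 1 / sum of squared normalised weights; it collapses toward 1
# when a small set of particles carries nearly all of the belief
ess <- 1 / sum(weights^2)
# A common resampling trigger: ESS below half the particle count
if (ess < n_particles / 2) {
  keep <- sample(seq_len(n_particles), size = n_particles,
                 replace = TRUE, prob = weights)
  proposals <- proposals[keep]
}
```

Resampling in proportion to the weights restores particle diversity at the cost of introducing duplicates, which the kernel smoothing step then perturbs back apart.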
pfilter_mvgam_online(data_assim = fake_data$data_test[1:7, ],
                     n_cores = 2, kernel_lambda = 1)
## Particles have already assimilated one or more observations. Skipping these
##
## Assimilating the next 6 observations
##
## Effective sample size is 13.33717 ...
##
## Smoothing particles ...
##
## Effective sample size is 165.15 ...
##
## Smoothing particles ...
##
## Effective sample size is 422.1291 ...
##
## Smoothing particles ...
##
## Effective sample size is 1661.858 ...
##
## Smoothing particles ...
##
## Effective sample size is 5000 ...
##
## Smoothing particles ...
##
## Effective sample size is 5000 ...
##
## Smoothing particles ...
##
## Last assimilation time was 7 1958
##
## Saving particles to pfilter/particles.rda
## ESS = 5000
Once assimilation is complete, generate the updated forecast from the particles using the covariate information in the remaining data_test observations. This function is designed to make assimilating observations simple: once the particles are initiated, all that needs to be provided is a dataframe of test data in exactly the same format as the data that were used to train the initial model. If no new observations are found (observations are arranged by year and then by season, so consistent indexing of these two variables is very important!) then the function returns NULL and the particles remain where they are in state space.
fc <- pfilter_mvgam_fc(file_path = "pfilter", n_cores = 2,
                       data_test = fake_data$data_test,
                       ylim = c(min(fake_data$data_train$y, na.rm = TRUE),
                                max(fake_data$data_test$y, na.rm = TRUE) * 1.25))
Compare the updated forecast to the original forecast to see how it has changed in light of the most recent observations
par(mfrow = c(1, 2))
plot_mvgam_fc(mod3, series = 1, data_test = fake_data$data_test,
              ylim = c(min(fake_data$data_train$y, na.rm = TRUE),
                       max(fake_data$data_test$y, na.rm = TRUE) * 1.25))
fc$Air()
Here it is apparent that the distribution has shifted slightly in light of the 6 observations that have been assimilated, and that our confidence in the remaining forecast horizon has improved (tighter uncertainty intervals). This is an advantageous way of allowing a model to slowly adapt to new conditions while breaking free of restrictive assumptions about residual distributions. See some of the many particle filtering lectures by Nathaniel Osgood for more details. Remove the particles from their stored directory when finished
unlink("pfilter", recursive = T)
# Lynx example
For our next univariate example, we will again explore how challenging it can be to forecast ahead with conventional GAMs and how mvgam overcomes these challenges. We begin by replicating the lynx analysis from the 2018 Ecological Society of America workshop on GAMs hosted by Eric Pedersen, David L. Miller, Gavin Simpson, and Noam Ross, with some minor adjustments. First, load the data and plot the series as well as its estimated autocorrelation function
library(mvgam)
data(lynx)
lynx_full = data.frame(year = 1821:1934,
population = as.numeric(lynx))
plot(lynx_full$population, type = "l", ylab = "Lynx trappings",
xlab = "Time")
acf(lynx_full$population, main = "")
There is a clear ~19-year cyclic pattern in the data, so we create a season term that can be used to model this effect and give a better representation of the data generating process. We also create a new year term that represents which long-term cycle each observation belongs to
plot(stl(ts(lynx_full$population, frequency = 19), s.window = "periodic"))
lynx_full$season <- (lynx_full$year %% 19) + 1
cycle_ends <- c(which(lynx_full$season == 19), NROW(lynx_full))
cycle_starts <- c(1, cycle_ends[1:length(which(lynx_full$season == 19))] + 1)
cycle <- vector(length = NROW(lynx_full))
for (i in 1:length(cycle_starts)) {
  cycle[cycle_starts[i]:cycle_ends[i]] <- i
}
lynx_full$year <- cycle
Add lag indicators needed to fit the nonlinear lag models that gave the best one step ahead point forecasts in the ESA workshop example. As in the example, we specify the default argument in the lag function as the mean log population.
mean_pop_l = mean(log(lynx_full$population))
lynx_full = dplyr::mutate(lynx_full, popl = log(population),
lag1 = dplyr::lag(popl, 1, default = mean_pop_l),
lag2 = dplyr::lag(popl, 2, default = mean_pop_l),
lag3 = dplyr::lag(popl, 3, default = mean_pop_l),
lag4 = dplyr::lag(popl, 4, default = mean_pop_l),
lag5 = dplyr::lag(popl, 5, default = mean_pop_l),
lag6 = dplyr::lag(popl, 6, default = mean_pop_l))
For mvgam models, the response needs to be labelled y and we also need an indicator of the series name as a factor variable
lynx_full$y <- lynx_full$population
lynx_full$series <- factor("series1")
Split the data into training (first 40 years) and testing (next 10 years of data) to evaluate multi-step ahead forecasts
lynx_train = lynx_full[1:40, ]
lynx_test = lynx_full[41:50, ]
The best-forecasting model in the course used nonlinear smooths of lags 1 and 2; we use those here, except that we also include a cyclic smooth for the 19-year cycles (as this seems like an important feature) as well as a yearly smooth for the long-term trend. Following the information about spline extrapolation above, we again fit a cubic B spline for the trend with a mix of penalties to try and rein in wacky extrapolation behaviours, and we extend the penalty to cover the years that we wish to predict. This will hopefully give us better uncertainty estimates for the forecast. In this example we assume the observations are Poisson distributed
lynx_mgcv = gam(population ~ s(season, bs = "cc", k = 19) +
                  s(year, bs = "bs", m = c(3, 2, 1, 0)) +
                  s(lag1, k = 5) + s(lag2, k = 5),
                knots = list(season = c(0.5, 19.5),
                             year = c(min(lynx_train$year - 1),
                                      min(lynx_train$year),
                                      max(lynx_test$year),
                                      max(lynx_test$year) + 1)),
                data = lynx_train, family = "poisson", method = "REML")
Inspect the model’s summary and estimated smooth functions for the season, year and lag terms
summary(lynx_mgcv)
##
## Family: poisson
## Link function: log
##
## Formula:
## population ~ s(season, bs = "cc", k = 19) + s(year, bs = "bs",
## m = c(3, 2, 1, 0)) + s(lag1, k = 5) + s(lag2, k = 5)
##
## Parametric coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) 6.666631 0.007511 887.6 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Approximate significance of smooth terms:
## edf Ref.df Chi.sq p-value
## s(season) 16.229 17.000 6770.2 <2e-16 ***
## s(year) 1.990 2.000 244.2 <2e-16 ***
## s(lag1) 3.984 3.999 712.0 <2e-16 ***
## s(lag2) 3.892 3.993 488.1 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## R-sq.(adj) = 0.967 Deviance explained = 97.7%
## -REML = 880.28 Scale est. = 1 n = 40
plot(lynx_mgcv, select = 1)
plot(lynx_mgcv, select = 2)
plot(lynx_mgcv, select = 3)
plot(lynx_mgcv, select = 4)
This model captures most of the deviance in the series and the functions are all confidently estimated to be non-zero and non-flat. So far, so good. Now for some forecasts for the out of sample period. First we must take posterior draws of smooth beta coefficients to incorporate the uncertainties around smooth functions when simulating forecast paths
coef_sim <- gam.mh(lynx_mgcv)$bs
Now we define a function to perform forecast simulations from the nonlinear lag model in a recursive fashion. Using starting values for the last two lags, the function will iteratively project the path ahead with a random sample from the model’s coefficient posterior
recurse_nonlin = function(model, lagged_vals, h) {
  # Initiate state vector
  states <- rep(NA, length = h + 2)
  # Last two values of the conditional expectations begin the state vector
  states[1] <- as.numeric(exp(lagged_vals[2]))
  states[2] <- as.numeric(exp(lagged_vals[1]))
  # Get a random sample of the smooth coefficient uncertainty matrix to
  # use for the entire forecast horizon of this particular path
  gam_coef_index <- sample(seq(1, NROW(coef_sim)), 1, T)
  # For each following timestep, recursively predict based on the
  # predictions at each previous lag
  for (t in 3:(h + 2)) {
    # Build the GAM linear predictor matrix using the two previous lags
    # of the (log) density
    newdata <- data.frame(lag1 = log(states[t - 1] + 0.01),
                          lag2 = log(states[t - 2] + 0.01),
                          season = lynx_test$season[t - 2],
                          year = lynx_test$year[t - 2])
    colnames(newdata) <- c("lag1", "lag2", "season", "year")
    Xp <- predict(model, newdata = newdata, type = "lpmatrix")
    # Calculate the posterior prediction for this timepoint
    mu <- rpois(1, lambda = exp(Xp %*% coef_sim[gam_coef_index, ]))
    # Fill in the state vector and iterate to the next timepoint
    states[t] <- mu
  }
  # Return the forecast path
  states[-c(1:2)]
}
Create the GAM’s forecast distribution by generating 1000 simulated forecast paths. Each path is fed the true observed values for the last two lags of the first out of sample timepoint, but they can deviate when simulating ahead depending on their particular draw of possible coefficients. Note, this is a bit slow and could easily be parallelised to speed up computations
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
for (i in 1:1000) {
  gam_sims[i, ] <- recurse_nonlin(lynx_mgcv,
                                  lagged_vals = c(lynx_test$lag1[1],
                                                  lynx_test$lag2[1]),
                                  h = 10)
}
Plot the mgcv model’s out of sample forecast for the next 10 years ahead
cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 0.95))
yupper <- max(cred_ints) * 1.25
plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), type = "l",
     col = rgb(1, 0, 0, alpha = 0), ylim = c(0, yupper),
     ylab = "Predicted lynx trappings", xlab = "Forecast horizon",
     main = "mgcv")
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
        c(cred_ints[1, ], rev(cred_ints[3, ])),
        col = rgb(150, 0, 0, max = 255, alpha = 100), border = NA)
cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 0.68))
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
        c(cred_ints[1, ], rev(cred_ints[3, ])),
        col = rgb(150, 0, 0, max = 255, alpha = 180), border = NA)
lines(cred_ints[2, ], col = rgb(150, 0, 0, max = 255), lwd = 2, lty = "dashed")
points(lynx_test$population[1:10], pch = 16)
lines(lynx_test$population[1:10])
A decent forecast? The shape is certainly correct, but the 95% uncertainty intervals appear to be far too wide (i.e. our upper interval extends up to 8 times the maximum number of trappings that have ever been recorded to this point). This is almost entirely due to the extrapolation behaviour of the B spline, as the lag smooth functions are not encountering values far outside the ranges they’ve been trained on and so are resorting mostly to interpolation. But a better way to evaluate forecasts than simply using visuals is to calculate a probabilistic score. Here we use the Discrete Rank Probability Score, which gives us an indication of how well calibrated our forecast’s uncertainty intervals are by comparing the mass of the forecast density against the true observed values. Forecasts with overly wide intervals are penalised, as are forecasts with overly narrow intervals that do not contain the true observations. At the same time we calculate coverage of the forecast’s 90% intervals, which is another useful way of evaluating different forecast proposals
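Concretely, a standard way to write the DRPS for a single count observation \(y\) with forecast cumulative distribution function \(F\) is:
\[DRPS(y,F)=\sum\limits_{i=0}^{\infty}\left(F(i)-\mathbb{1}(i\geq y)\right)^2\,,\]
where \(\mathbb{1}\) is the indicator function. In practice the infinite sum is truncated at a suitably large count, and \(F\) is estimated empirically from the simulated forecast draws.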
# Discrete Rank Probability Score and coverage of 90% interval
drps_score <- function(truth, fc, interval_width = 0.9) {
  nsum <- 1000
  Fy = ecdf(fc)
  ysum <- 0:nsum
  indicator <- ifelse(ysum - truth >= 0, 1, 0)
  score <- sum((indicator - Fy(ysum))^2)
  # Is the true value within the 90% HPD interval?
  interval <- hpd(fc, interval_width)
  in_interval <- ifelse(truth <= interval[3] & truth >= interval[1], 1, 0)
  return(c(score, in_interval))
}
# Wrapper to operate on all observations in fc_horizon
drps_mcmc_object <- function(truth, fc, interval_width = 0.9) {
  indices_keep <- which(!is.na(truth))
  if (length(indices_keep) == 0) {
    scores = data.frame(drps = rep(NA, length(truth)),
                        interval = rep(NA, length(truth)))
  } else {
    scores <- matrix(NA, nrow = length(truth), ncol = 2)
    for (i in indices_keep) {
      scores[i, ] <- drps_score(truth = as.vector(truth)[i],
                                fc = fc[, i], interval_width)
    }
  }
  scores
}
# Calculate DRPS over the 10-year horizon
# for the mgcv model
lynx_mgcv1_drps <- drps_mcmc_object(truth = lynx_test$population[1:10],
fc = gam_sims)
What if we remove the yearly trend and let the lag smooths capture more of the temporal dependencies? Will that improve the forecast distribution? Run a second model and plot the forecast (note that this plot will be on quite a different y-axis scale compared to the first plot above)
lynx_mgcv2 = gam(population ~ s(season, bs = "cc", k = 19) +
                   s(lag1, k = 5) + s(lag2, k = 5),
                 data = lynx_train, family = "poisson", method = "REML")
coef_sim <- gam.mh(lynx_mgcv2)$bs
gam_sims <- matrix(NA, nrow = 1000, ncol = 10)
for (i in 1:1000) {
  gam_sims[i, ] <- recurse_nonlin(lynx_mgcv2,
                                  lagged_vals = c(lynx_test$lag1[1],
                                                  lynx_test$lag2[1]),
                                  h = 10)
}
cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 0.95))
yupper <- max(cred_ints) * 1.25
plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), type = "l",
     col = rgb(1, 0, 0, alpha = 0), ylim = c(0, yupper),
     ylab = "Predicted lynx trappings", xlab = "Forecast horizon",
     main = "mgcv")
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
        c(cred_ints[1, ], rev(cred_ints[3, ])),
        col = rgb(150, 0, 0, max = 255, alpha = 100), border = NA)
cred_ints <- apply(gam_sims, 2, function(x) hpd(x, 0.68))
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
        c(cred_ints[1, ], rev(cred_ints[3, ])),
        col = rgb(150, 0, 0, max = 255, alpha = 180), border = NA)
lines(cred_ints[2, ], col = rgb(150, 0, 0, max = 255), lwd = 2, lty = "dashed")
points(lynx_test$population[1:10], pch = 16)
lines(lynx_test$population[1:10])
Calculate DRPS over the 10-year horizon for the second mgcv model
lynx_mgcv2_drps <- drps_mcmc_object(truth = lynx_test$population[1:10],
fc = gam_sims)
This forecast is highly overconfident, with very unrealistic uncertainty intervals due to the interpolation behaviours of the lag smooths. You can certainly keep trying different formulations (in our experience the B spline variant above produces the best forecasts of any tested mgcv model, though we did not test an exhaustive set), but hopefully it is clear that forecasting using splines is tricky business, and each time you do it you will likely end up homing in on different combinations of penalties, knot selections etc. Now we will fit an mvgam model for comparison. This fits a model similar to the mgcv model directly above, but with a full time series model for the errors (in this case an AR1 process) rather than smoothing splines that have no concept of the future. We do not use a year term, both to reduce any possible extrapolation and because the latent dynamic component should capture this temporal variation. We estimate the model in JAGS using MCMC sampling (note that JAGS 4.3.0 is required; installation links are found here)
lynx_mvgam <- mvjagam(data_train = lynx_train, data_test = lynx_test,
                      formula = y ~ s(season, bs = "cc", k = 19),
                      knots = list(season = c(0.5, 19.5)),
                      family = "poisson", trend_model = "AR1",
                      burnin = 5000)
## Compiling rjags model...
## Starting 2 rjags simulations using a PSOCK cluster with 2 nodes on host
## 'localhost'
## Simulation complete
## Note: Summary statistics were not produced as there are >50 monitored
## variables
## [To override this behaviour see ?add.summary and ?runjags.options]
## FALSEFinished running the simulation
## NOTE: Stopping adaptation
Calculate the out of sample forecast from the fitted mvgam model and plot
fits <- MCMCvis::MCMCchains(lynx_mvgam$jags_output, "ypred")
fits <- fits[, (NROW(lynx_mvgam$obs_data) + 1):(NROW(lynx_mvgam$obs_data) + 10)]
cred_ints <- apply(fits, 2, function(x) hpd(x, 0.95))
yupper <- max(cred_ints) * 1.25
plot(cred_ints[3, ] ~ seq(1:NCOL(cred_ints)), type = "l",
     col = rgb(1, 0, 0, alpha = 0), ylim = c(0, yupper),
     ylab = "", xlab = "Forecast horizon", main = "mvgam")
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
        c(cred_ints[1, ], rev(cred_ints[3, ])),
        col = rgb(150, 0, 0, max = 255, alpha = 100), border = NA)
cred_ints <- apply(fits, 2, function(x) hpd(x, 0.68))
polygon(c(seq(1:(NCOL(cred_ints))), rev(seq(1:NCOL(cred_ints)))),
        c(cred_ints[1, ], rev(cred_ints[3, ])),
        col = rgb(150, 0, 0, max = 255, alpha = 180), border = NA)
lines(cred_ints[2, ], col = rgb(150, 0, 0, max = 255), lwd = 2, lty = "dashed")
points(lynx_test$population[1:10], pch = 16)
lines(lynx_test$population[1:10])
Calculate DRPS over the 10-year horizon for the mvgam model
lynx_mvgam_drps <- drps_mcmc_object(truth = lynx_test$population[1:10],
fc = fits)
How do the out of sample DRPS scores stack up for these three models? Remember, our goal is to minimise DRPS while providing 90% intervals that are near, but not less than, 0.9. The DRPS and 90% interval coverage for the first mgcv model (with the B spline year term)
sum(lynx_mgcv1_drps[, 1])
## [1] 1569.438
mean(lynx_mgcv1_drps[, 2])
## [1] 0.8
For the second mgcv model
sum(lynx_mgcv2_drps[, 1])
## [1] 2040.159
mean(lynx_mgcv2_drps[, 2])
## [1] 0
And for the mvgam model
sum(lynx_mvgam_drps[, 1])
## [1] 678.0076
mean(lynx_mvgam_drps[, 2])
## [1] 1
The mvgam model has much more realistic uncertainty than the mgcv versions above. Of course this is just one out of sample comparison; to really determine which model is most appropriate for forecasting we would want to run many of these tests using a rolling window approach. Have a look at this model’s summary to see what is being estimated (note that longer MCMC runs would probably be needed to increase effective sample sizes)
summary_mvgam(lynx_mvgam)
## GAM formula:
## y ~ s(season, bs = "cc", k = 19)
##
## Family:
## Poisson
##
## N series:
## 1
##
## N observations per series:
## 40
##
## GAM smooth term approximate significances:
## edf Ref.df Chi.sq p-value
## s(season) 15.68 17.00 147 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## GAM coefficient (beta) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## (Intercept) 5.8389102 6.61683869 6.84601903 2.67 14
## s(season).1 -1.0949739 -0.65523967 -0.09265372 1.09 58
## s(season).2 -0.3219072 0.14576769 0.67356702 1.10 43
## s(season).3 0.4153367 0.86779002 1.43435505 1.95 30
## s(season).4 0.8710172 1.47459224 2.17385633 2.54 22
## s(season).5 1.0371451 1.69773762 2.53888717 2.96 21
## s(season).6 0.4335472 1.15167147 1.75201109 2.67 26
## s(season).7 -0.7669307 -0.04616469 0.69001506 1.47 37
## s(season).8 -1.6821003 -0.91024712 -0.27555546 1.11 40
## s(season).9 -1.6631383 -0.99156729 -0.32269648 1.01 54
## s(season).10 -1.1207783 -0.51322286 0.10243926 1.01 50
## s(season).11 -0.1164903 0.37888684 0.99036640 1.14 55
## s(season).12 0.8627947 1.40821539 1.94474384 1.45 34
## s(season).13 0.9962204 1.62235463 2.09246035 2.03 41
## s(season).14 0.5822655 1.18559811 1.66974318 1.31 32
## s(season).15 -0.3599914 0.16364091 0.64232743 1.18 42
## s(season).16 -1.1464927 -0.58198962 -0.10396872 1.27 77
## s(season).17 -1.4931455 -1.00870911 -0.52643683 1.39 68
##
## GAM smoothing parameter (rho) estimates:
## 2.5% 50% 97.5% Rhat n.eff
## s(season) 3.604389 4.470702 5.225043 1.01 323
##
## Latent trend drift (phi) and AR parameter estimates:
## 2.5% 50% 97.5% Rhat n.eff
## phi 0.0000000 0.0000000 0.000000 NaN 0
## ar1 0.4704539 0.7099536 0.929785 1.37 477
## ar2 0.0000000 0.0000000 0.000000 NaN 0
## ar3 0.0000000 0.0000000 0.000000 NaN 0
##
Now inspect each model’s estimated smooth for the 19-year cyclic pattern. Note that the mvgam smooth plot is on a different scale to the mgcv plot, but interpretation is similar. The mgcv smooth is much wigglier, likely because it is compensating for remaining autocorrelation not captured by the lag smooths. We could probably remedy this by reducing k in the seasonal smooth for the mgcv model. In practice this works well, but we recommend leaving k larger for the mvgam’s seasonal smooth, as in our experience this tends to lead to better performance and convergence
plot(lynx_mgcv, select = 1, shade = T)
plot_mvgam_smooth(lynx_mvgam, 1, "season")
We can also view the mvgam’s posterior predictions for the entire series (testing and training)
plot_mvgam_fc(lynx_mvgam, data_test = lynx_test)
And the estimated trend
plot_mvgam_trend(lynx_mvgam, data_test = lynx_test)
A key aspect of ecological forecasting is to understand how different components of a model contribute to forecast uncertainty. We can estimate contributions to forecast uncertainty for the GAM smooth functions and the latent trend using mvgam
plot_mvgam_uncertainty(lynx_mvgam, data_test = lynx_test)
Both components contribute to forecast uncertainty, with the trend component contributing more over time (as it should since this is the stochastic forecast component). This suggests we would still need some more work to learn about factors driving the dynamics of the system. But we will leave the model as-is for this example. Diagnostics of the model can also be performed using mvgam. Have a look at the model’s residuals, which are posterior medians of Dunn-Smyth randomised quantile residuals so should follow approximate normality. We are primarily looking for a lack of autocorrelation, which would suggest our AR1 model is appropriate for the latent trend
plot_mvgam_resids(lynx_mvgam)